Paola Igarteburu, pao.igarteburu@gmail.com
Gabriel Moncarz, gabriel.datamining@gmail.com
Student Team: YES
Did you use data from both mini-challenges? NO
Tableau desktop and public
SQLserver
Python - scipy library
Approximately how many hours were spent working on this submission in total?
50
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES
Video Download
Video: https://www.youtube.com/watch?v=lIsHTTBj6e0
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.
We began by making a bubble plot of check-ins and movement coordinates to match attractions to each XY pair on the DinoFun map. Attractions are labeled by ID; bubble size represents total check-ins and colour the type of attraction. The smaller points are the registered movements, which we treated as the park’s paths.
We could easily notice that Thrill Attractions were the most popular ones.
Based on movement patterns along the weekend and check-ins by hour, we identified the “time spent inside the park per user” as a key variable for the analysis. In order to start identifying groups, we plotted check-ins and movements per user per timestamp sorting user ids by time spent in the park (descending). At first glance, we could notice that there were some users that had movements and check-ins at the same timestamp.
Then, for the same IDs, we coloured each check-in by type of attraction, which revealed patterns in the types of attraction visited.
These visualizations suggested not only that some users followed a certain pattern but also that some users moved in groups. This held on all three days.
To identify which users “move together” in the same group we processed the data again with the following logic:
With this process we found almost four thousand groups. The following Treemap visualization shows the number of IDs that “move together” by size of group for each day. For each tile, the first number is the quantity of IDs in each group and the second is the total number of groups of that size. The smallest groups were the most common: those formed by 2, 3 and 4 users. The largest group had 34 users and visited the park on Saturday.
To verify the movement pattern consistency of the identified groups, we plotted each movement per timestamp and highlighted the check-ins with different coloured shapes by type of attraction. In the following visualization we distinguished 3 small groups at the top versus a large group pattern at the bottom.
We compared attraction attendance between small groups (2 and 3 users) and large groups (more than 25 users). As the visualization shows, large groups tended to visit fewer than 5 Rides for Everyone, 3 to 4 Kiddies Rides and 8 to 10 Thrill Rides. Regarding shows and entertainment, the average groups either went or avoided going to a show.
In contrast, small groups had larger deviation: small groups had more flexibility moving around the park whereas large groups tended to be conditioned by their size.
To detect more movement patterns, we used a calculated variable, “distance moved”: we estimated the maximum distance traveled by each ID within each hour. For the distance we used the Euclidean distance between the most extreme XY pairs for each user.
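A minimal sketch of how such a “distance moved” variable could be computed, using only the Python standard library. It assumes each record is a `(timestamp, user_id, x, y)` tuple and reads “most extreme XY pairs” as the two positions that are farthest apart within the hour; both the record layout and that reading are assumptions, and the function name and sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations
from math import hypot


def max_distance_per_hour(records):
    """Return {(user_id, hour_start): largest pairwise Euclidean distance}."""
    # Collect the distinct positions seen for each user in each clock hour.
    points = defaultdict(set)
    for ts, user_id, x, y in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        points[(user_id, hour)].add((x, y))

    # The "distance moved" is the largest distance between any two positions;
    # a user seen at a single position in an hour gets 0.0.
    result = {}
    for key, pts in points.items():
        result[key] = max(
            (hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in combinations(pts, 2)),
            default=0.0,
        )
    return result


# Made-up sample records, for illustration only.
records = [
    (datetime(2014, 6, 6, 9, 5), 1, 0, 0),
    (datetime(2014, 6, 6, 9, 20), 1, 3, 4),
    (datetime(2014, 6, 6, 9, 40), 1, 1, 1),
    (datetime(2014, 6, 6, 10, 2), 1, 1, 1),
]
distances = max_distance_per_hour(records)
```

Comparing every pair of positions is quadratic in the number of distinct positions per user-hour, which is acceptable here because a user can only visit a limited number of grid cells in one hour.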
We grouped the distance moved into bins (bin size = 10) and plotted it by timestamp. The bubble size and colour were given by the total number of users moving each distance. We found that the most frequent distance traveled was between 50 and 70 for all three days. We also saw that more users traveled that distance during the first half of Sunday.
Re-evaluating the groups by the metric of time spent in the park, we found two more groups: users that spent less than 300 minutes inside the park (347 users, half of whom visited the park on Sunday) versus users that stayed longer.
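The binning step can be sketched as follows. This is an illustrative example rather than the exact procedure used: it assumes the hourly distances are available as a dictionary keyed by `(user_id, hour)`, and the names `bin_start` and `users_per_bin` are hypothetical.

```python
from collections import Counter

BIN_SIZE = 10  # bin width stated in the text


def bin_start(distance, bin_size=BIN_SIZE):
    """Lower edge of the bin that contains `distance` (e.g. 52.3 -> 50)."""
    return int(distance // bin_size) * bin_size


def users_per_bin(hourly_distances):
    """Count users per (hour, distance bin).

    hourly_distances maps (user_id, hour) to the distance moved in that hour.
    """
    counts = Counter()
    for (_user_id, hour), distance in hourly_distances.items():
        counts[(hour, bin_start(distance))] += 1
    return counts


# Made-up sample values, for illustration only.
sample = {(1, 9): 52.3, (2, 9): 58.0, (3, 9): 61.7, (1, 10): 9.9}
counts = users_per_bin(sample)
```

Each `(hour, bin)` count would then drive both the size and the colour of one bubble in a plot like the one described.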
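As a rough sketch, time spent in the park can be approximated as the gap between a user’s first and last record on a given day, and then split at the 300-minute threshold. The `(timestamp, user_id)` record layout and the function names are assumptions made for this example.

```python
from datetime import datetime


def minutes_in_park(records):
    """Return {(user_id, date): minutes between first and last record}."""
    first_last = {}
    for ts, user_id in records:
        key = (user_id, ts.date())
        earliest, latest = first_last.get(key, (ts, ts))
        first_last[key] = (min(earliest, ts), max(latest, ts))
    return {
        key: (latest - earliest).total_seconds() / 60
        for key, (earliest, latest) in first_last.items()
    }


def split_by_stay(stays, threshold=300):
    """Split user-days into those under `threshold` minutes and the rest."""
    short = {key for key, minutes in stays.items() if minutes < threshold}
    return short, set(stays) - short


# Made-up sample records, for illustration only.
records = [
    (datetime(2014, 6, 8, 9, 0), 1),
    (datetime(2014, 6, 8, 12, 30), 1),
    (datetime(2014, 6, 8, 8, 30), 2),
    (datetime(2014, 6, 8, 18, 0), 2),
]
stays = minutes_in_park(records)
short_stays, long_stays = split_by_stay(stays)
```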
Among users that stayed less than 300 minutes, we distinguished two groups: morning and afternoon users. Some of the former went exclusively to visit the Thrill Rides and had several movement points indicating they moved around the park to visit different rides. These were users with determination to make the most of their short stay in the park.
We also wanted to characterize users that didn’t attend any Show (more than 1400). While we were expecting to find people with high frequency on Thrill Rides, we actually found that a large portion were people that decided not to attend any attraction during the afternoon but stayed wandering inside the park.
We wanted to identify groups by attractions. By plotting the results of filtering IDs that had more check-ins at Kiddies attractions, we saw that these groups also visited many Thrill Rides. We also noticed that they were still checking in at attractions until the end of the day, which makes them a very active group. We had actually expected to find users that only visited Kiddies Rides and could be considered families.
Furthermore, we wanted to characterize very “well informed” users. These were the ones that checked in at any information facility. The following plot shows that these users actually visited more attractions on average (more than 18 attractions). These were users that made the most out of their day in the park.
MC1.2 – Are there notable differences in the patterns of activity on in the park across the three days? Please describe the notable difference you see.
To characterize activity along the different days, we began by plotting only check-in XY pairs by day. The size of the bubbles is the frequency of check-ins registered and the colour identifies the type of attraction.
At first glance, we highlighted:
In the following visualization, we plotted the distribution of check-ins by timestamp and maintained colours by type of attraction. We identified that the frequency of attendance to attractions diminished by hour and that this happened more abruptly during Sunday.
The pattern of behavior indicated that users were more keen on visiting attractions during the morning. After 4 pm the attendance to attractions decreased dramatically.
A clear case of attendance spikes in the afternoon was “Shows and Entertainments” during Friday and Saturday. We also observed that users went for information and assistance when they entered the park.
To characterize the movement we plotted XY pairs by day and encoded frequency by colour and size (scale shown under the plot). The busiest paths in the park coincided with the location of the most visited attractions. They also indicated probable bottlenecks in transportation inside the park, as busy areas did not show continuous patterns.
Although in the first visualization we could see that Kiddies Rides had almost the same attendance as Rides for Everyone, this pattern was not reflected in higher intensity on the corresponding walking paths.
MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.
In our first approach we decided to analyze distribution of movement (in percentage) and check-in for attractions near the crime zone for each hour of each day. For this, we used a bar chart.
We didn’t find anything abnormal related to the percentage of movements versus check-ins in attractions near the crime zone. The last check-ins at Creighton Pavilion were at 11 am, a possible time window for the accident. Movements didn’t stop, which means people were still passing by but no one was entering.
While analyzing group behavior we noticed that users staying less than 300 minutes inside the park on Sunday only visited the park during the morning. We didn't find a specific trend in the attractions attended, although we could see some users moving in groups and even some users not showing any check-ins. A possible reason for this behaviour is that the park might have closed after the accident.
We zoomed in on users that didn’t show any check-ins inside attractions, lifting the restriction of 300 minutes. There were only 65 such users and only 21 of them visited the park on Sunday. We plotted the movement pattern of users that didn’t register any check-in and marked with an X the movement records that belong to Creighton Pavilion. We identified 3 users that visited Creighton Pavilion during the morning without checking in and left at noon: 47411, 159893, 921888.
Next, we plotted the movement pattern inside the park for those 3 IDs. When visualizing the movement by minute we could notice that the most suspicious movements came from the IDs in red and green: they approached the crime area directly after entering the park.
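A filter of this kind can be sketched as below. It assumes records of the form `(user_id, record_type, x, y)` where `record_type` is either `"check-in"` or `"movement"`; the coordinate used for the pavilion is a placeholder for illustration, not a value taken from this report.

```python
def users_without_checkins(records):
    """Return the IDs that appear in the data but never register a check-in."""
    seen, checked_in = set(), set()
    for user_id, record_type, _x, _y in records:
        seen.add(user_id)
        if record_type == "check-in":
            checked_in.add(user_id)
    return seen - checked_in


def passed_through(records, users, location):
    """Return the subset of `users` with at least one record at `location`."""
    return {
        user_id
        for user_id, _record_type, x, y in records
        if user_id in users and (x, y) == location
    }


# Placeholder coordinate (assumption), standing in for the pavilion's XY pair.
PAVILION_XY = (32, 33)

# Made-up sample records, for illustration only.
records = [
    (1, "movement", 10, 10),
    (1, "check-in", 10, 11),
    (2, "movement", 31, 33),
    (2, "movement", 32, 33),
    (3, "movement", 50, 50),
]
no_checkins = users_without_checkins(records)
near_pavilion = passed_through(records, no_checkins, PAVILION_XY)
```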
This visualization can also be found in the following Tableau Public link: goo.gl/a7XTQd
Among other anomalies, we noticed that there were extreme users that moved very short distances or large distances within each hour. The following visualization shows the traveled trajectory for each day for the user that had the largest average distance (in red) and the minimum average distance (in blue).
Users with minimum average distance on Friday moved around the north-west area, while on the other days they actually covered a larger portion of the park. It is interesting to see that the user with maximum average distance on Sunday actually had fewer check-ins than the user on Friday, meaning that they moved a lot but weren’t really thrilled with the attractions.
Finally, to check for anomalies in the detector we extracted the first and last movement of each ID for each day. We detected an anomaly for Saturday and Sunday: we found that during those days the movement detector identified XY pairs that were not Entrance/Exit or Information & Assistance.
For the last movement of each ID in the park, we found that on Friday the movement detector identified as the last position of most users an XY pair that was not an Entrance/Exit or Information point. This anomaly seems to have been corrected for Saturday and Sunday, as seen in the following visualization.
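The first/last-movement check can be sketched as follows. It assumes records of the form `(timestamp, user_id, x, y)`; the set of expected entrance and information coordinates is a placeholder chosen for the example, and the function names are hypothetical.

```python
from datetime import datetime


def first_last_positions(records):
    """Return {(user_id, date): (first_xy, last_xy)} for each user and day."""
    endpoints = {}
    # Sorting the tuples orders the records by timestamp first.
    for ts, user_id, x, y in sorted(records):
        key = (user_id, ts.date())
        first, _ = endpoints.get(key, ((x, y), None))
        endpoints[key] = (first, (x, y))
    return endpoints


def flag_unexpected(endpoints, expected):
    """Return {key: (first_is_unexpected, last_is_unexpected)}."""
    return {
        key: (first not in expected, last not in expected)
        for key, (first, last) in endpoints.items()
    }


# Placeholder coordinates (assumption) for entrance/exit and information points.
EXPECTED_XY = {(0, 67), (63, 99), (99, 77)}

# Made-up sample records, for illustration only.
records = [
    (datetime(2014, 6, 6, 9, 0), 1, 63, 99),
    (datetime(2014, 6, 6, 13, 0), 1, 40, 40),
    (datetime(2014, 6, 6, 17, 0), 1, 45, 24),
    (datetime(2014, 6, 7, 9, 0), 1, 0, 67),
    (datetime(2014, 6, 7, 16, 0), 1, 99, 77),
]
flags = flag_unexpected(first_last_positions(records), EXPECTED_XY)
```

Counting the flagged user-days per date would then show on which days the first or last detected position falls outside the expected points.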